13 research outputs found

    DHLP 1&2: Giraph based distributed label propagation algorithms on heterogeneous drug-related networks

    Background and Objective: Heterogeneous complex networks are large graphs consisting of different types of nodes and edges. Extracting knowledge from these networks is complicated, and their scale is steadily increasing, so scalable methods are required. Methods: In this paper, we introduce two distributed label propagation algorithms for heterogeneous networks, namely DHLP-1 and DHLP-2. Biological networks are one type of heterogeneous complex network. As a case study, we measured the efficiency of DHLP-1 and DHLP-2 on a biological network consisting of drugs, diseases, and targets. The problem we studied in this network is drug repositioning, but our algorithms can be used as general methods for heterogeneous networks other than biological ones. Results: We compared the proposed algorithms with their non-distributed counterparts, MINProp and Heter-LP. The experiments showed good performance of the algorithms in terms of running time and accuracy. Comment: Source code available for Apache Giraph on Hadoop
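The abstract does not spell out how DHLP-1 and DHLP-2 propagate labels. As a rough illustration only, the Python sketch below simulates, in a single process, the vertex-centric superstep pattern that Giraph programs follow: each vertex sends its current score to its neighbors, then recomputes its score from the normalized incoming mass mixed with its initial (seed) label. The function name `propagate`, the mixing weight `alpha`, and the fixed superstep count are assumptions for illustration; this is neither the Giraph API nor the authors' DHLP update rule.

```python
"""Single-process simulation of vertex-centric (Pregel-style) label propagation.

Assumptions, not taken from DHLP-1/DHLP-2: an undirected weighted graph,
a fixed mixing weight ALPHA, and a fixed number of supersteps.
"""

ALPHA = 0.5  # hypothetical weight on propagated mass vs. the initial label


def propagate(edges, initial, supersteps=20, alpha=ALPHA):
    # Build weighted adjacency lists; every vertex must appear in `initial`.
    neighbors = {v: [] for v in initial}
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))
    # Weighted degree of each vertex, used to normalize incoming messages.
    degree = {v: sum(w for _, w in nbrs) for v, nbrs in neighbors.items()}

    score = dict(initial)
    for _ in range(supersteps):
        # "Send" phase: each vertex sends its weighted score to its neighbors.
        inbox = {v: 0.0 for v in score}
        for u, nbrs in neighbors.items():
            for v, w in nbrs:
                inbox[v] += w * score[u]
        # "Compute" phase: mix normalized incoming mass with the seed label.
        score = {
            v: alpha * (inbox[v] / degree[v] if degree[v] else 0.0)
            + (1 - alpha) * initial[v]
            for v in score
        }
    return score


if __name__ == "__main__":
    # Hypothetical drug (d*), target (t*), and disease (x*) vertices.
    result = propagate(
        [("d1", "t1", 1.0), ("t1", "x1", 1.0), ("x1", "d2", 1.0)],
        {"d1": 1.0, "t1": 0.0, "x1": 0.0, "d2": 0.0, "d3": 0.0},
    )
    print(result)
```

With a seed on one drug vertex, scores decay with graph distance from the seed, and a vertex with no path to any seeded vertex stays at zero. A real Giraph job would distribute the send and compute phases across workers; the two-phase loop here only mirrors that structure.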

    Assessing mortality prediction through different representation models based on concepts extracted from clinical notes

    Recent years have seen particular interest in using electronic medical records (EMRs) for secondary purposes to enhance the quality and safety of healthcare delivery. EMRs tend to contain large amounts of valuable clinical notes. Embedding learning is a method for converting notes into a format that makes them comparable. Transformer-based representation models have recently made a great leap forward; these models are pre-trained on large online datasets to understand natural language text effectively. The quality of a learned embedding is influenced by how clinical notes are used as input to representation models. A clinical note has several sections with different levels of information value, and healthcare providers commonly use different expressions for the same concept. Existing methods use clinical notes directly, or after initial preprocessing, as input to representation models. In contrast, to learn a good embedding, we first identified the most essential sections of the clinical notes. We then mapped the concepts extracted from the selected sections to their standard names in the Unified Medical Language System (UMLS) and used the standard phrases corresponding to the unique concepts as input for clinical models. We performed experiments to measure the usefulness of the learned embedding vectors for hospital mortality prediction on a subset of the publicly available Medical Information Mart for Intensive Care (MIMIC-III) dataset. According to the experiments, clinical transformer-based representation models produced better results when their input was generated from the standard names of extracted unique concepts than with other input formats. The best-performing models were, in order, BioBERT, PubMedBERT, and UmlsBERT.
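The input construction described above (map extracted concept mentions to standard names, then keep each unique concept once) can be sketched in a few lines. A small hypothetical synonym table stands in for a real UMLS concept mapper; the table, its placeholder concept IDs, the `build_model_input` name, and the `"; "` separator are all assumptions. In a real pipeline the resulting string would be fed to a clinical transformer model.

```python
# Hypothetical synonym table standing in for a UMLS lookup. The concept IDs
# and preferred names are illustrative placeholders, not real UMLS CUIs.
SYNONYM_TO_CONCEPT = {
    "heart attack": "CONCEPT_A",
    "myocardial infarction": "CONCEPT_A",
    "mi": "CONCEPT_A",
    "high blood pressure": "CONCEPT_B",
    "hypertension": "CONCEPT_B",
}
PREFERRED_NAME = {
    "CONCEPT_A": "myocardial infarction",
    "CONCEPT_B": "hypertension",
}


def build_model_input(mentions):
    """Map raw mentions to concepts, keep each concept once, emit standard names."""
    # Unmapped mentions (lookup returns None) are dropped.
    concepts = (SYNONYM_TO_CONCEPT.get(m.strip().lower()) for m in mentions)
    # dict.fromkeys keeps first-occurrence order while removing duplicates.
    unique = dict.fromkeys(c for c in concepts if c is not None)
    return "; ".join(PREFERRED_NAME[c] for c in unique)


if __name__ == "__main__":
    print(build_model_input(["Heart attack", "HTN", "hypertension", "MI"]))
```

Here "Heart attack" and "MI" collapse to one standard phrase, and "HTN" is dropped only because the toy table lacks it; a real mapper would cover far more variants.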

    MultiGBS: A multi-layer graph approach to biomedical summarization

    Automatic text summarization methods generate a shorter version of the input text to give the reader a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of the text when selecting sentences, which can cause essential information to be lost. In this study, we propose a domain-specific method that models a document as a multi-layer graph so that multiple features of the text can be processed at the same time. The features used in this paper are word similarity, semantic similarity, and co-reference similarity, which are modelled as three different layers. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts concepts and relationships using tools such as SemRep, MetaMap, and OGER. Extensive evaluation with ROUGE and BERTScore shows increased F-measure values.
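The abstract does not detail how MultiRank scores sentences, so the Python sketch below uses a deliberately simpler stand-in to show the general idea of extractive selection over several similarity layers: it averages the layers (for example word, semantic, and co-reference similarity) into one weighted sentence graph, runs weighted PageRank on it, and returns the top-`k` sentences in their original order. It is not MultiRank and ignores the concept-count criterion; `pagerank`, `summarize`, the damping factor, and the layer averaging are assumptions.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank over a square matrix of non-negative edge weights."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1 - damping) / n] * n
        for i in range(n):
            if out_sum[i] == 0:
                # Dangling sentence: spread its rank uniformly.
                for j in range(n):
                    new[j] += damping * rank[i] / n
            else:
                for j in range(n):
                    new[j] += damping * rank[i] * weights[i][j] / out_sum[i]
        rank = new
    return rank


def summarize(sentences, layers, k=2):
    """Average similarity layers, rank sentences, keep the top k in document order."""
    n = len(sentences)
    combined = [
        [sum(layer[i][j] for layer in layers) / len(layers) for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(combined)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    word = [[0, 1, 0.1], [1, 0, 0.1], [0.1, 0.1, 0]]
    semantic = [[0, 0.8, 0.1], [0.8, 0, 0.1], [0.1, 0.1, 0]]
    coref = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(summarize(["s0", "s1", "s2"], [word, semantic, coref], k=2))
```

Averaging is the simplest way to merge layers; it treats every feature as equally informative, which is exactly the assumption a layer-aware method such as MultiRank avoids.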

    A computational drug repositioning method applied to rare diseases: adrenocortical carcinoma

    Rare or orphan diseases affect only small populations, which limits the economic incentive for drug development and often results in a lack of progress towards treatment. Drug repositioning is a promising approach in these cases because of its low cost. In this approach, one attempts to identify new purposes for existing drugs that have already been developed and approved for use. By applying drug repositioning to identify novel treatments for rare diseases, we can overcome the lack of economic incentives and make concrete progress towards new therapies. Adrenocortical carcinoma (ACC) is a rare disease with no practical and definitive therapeutic approach. We apply Heter-LP, a new drug repositioning method, to suggest novel therapeutic avenues for ACC. Our analysis identifies novel putative drug-disease, drug-target, and disease-target relationships for ACC, which include Cosyntropin (drug) and DHCR7, IGF1R, MC1R, MAP3K3, and TOP2A (protein targets). When the results are analyzed using all available information, a number of the novel predicted associations related to ACC appear to be valid according to current knowledge. We expect the predicted relations to be useful for drug repositioning in ACC, since the resulting ranked lists of drugs and protein targets can be used to expedite the necessary clinical processes.

    Systems biology-derived genetic signatures of mastitis in dairy cattle: a new avenue for drug repurposing

    Mastitis, a disease with high incidence worldwide, is the most prevalent and costly disease in the dairy industry. Gram-negative bacteria such as Escherichia coli (E. coli) are assumed to be among the leading agents causing acute severe infection with clinical signs. Environmental mastitis pathogens such as E. coli are the primary etiological agents of bovine mastitis in well-managed dairy farms. The response to E. coli infection has a complex pattern affected by genetic and environmental parameters. Moreover, the efficacy of antibiotic and/or anti-inflammatory treatment in E. coli mastitis is still a topic of scientific debate, and studies on the treatment of clinical cases show conflicting results. Unraveling the bio-signature of mastitis in dairy cattle can open new avenues for drug repurposing. In the current research, a novel semi-supervised heterogeneous label propagation algorithm named Heter-LP, which applies both local and global network features for data integration, was used to identify potential novel therapeutic avenues for the treatment of E. coli mastitis. Online data repositories relevant to known diseases, drugs, and gene targets, along with other specialized biological information on E. coli mastitis, including critical genes with robust bio-signatures, drugs, and related disorders, were used as input data for analysis with the Heter-LP algorithm. Our research identified novel drugs such as Glibenclamide, Ipratropium, Salbutamol, and Carbidopa as possible therapeutics against E. coli mastitis. The predicted relationships can be used by pharmaceutical scientists or veterinarians to find commercially efficacious medicines, or combinations of two or more active compounds, to treat this infectious disease.

    A Study into patient similarity through representation learning from medical records

    Patient similarity assessment, which identifies patients similar to a given patient, can help improve medical care. The assessment can be performed using electronic medical records (EMRs). Measuring patient similarity requires converting heterogeneous EMRs into comparable formats so that the distance between them can be calculated. Although versatile document representation learning methods have been developed in recent years, it is still unclear how complex EMR data should be processed to create the most useful patient representations. This study presents a new data representation method for EMRs that takes the information in clinical narratives into account. To address the limitations of previous approaches in handling complex parts of EMR data, we propose an unsupervised method for building a patient representation that integrates unstructured data with structured data extracted from patients' EMRs. To model the extracted data, we employed a tree structure that captures the temporal relations among multiple medical events in the EMR. We processed clinical notes to extract symptoms, signs, and diseases using tools such as medspaCy, MetaMap, and scispaCy, and mapped the entities to the Unified Medical Language System (UMLS). After creating the tree, we applied two novel relabeling methods to its non-leaf nodes to capture two temporal aspects of the extracted events. Traversing the tree generates a sequence from which an embedding vector is created for each patient. A comprehensive evaluation on patient similarity and mortality prediction tasks showed that the proposed model achieves lower mean squared error (MSE), higher precision, and higher normalized discounted cumulative gain (NDCG) than the baselines.
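The tree-to-sequence step can be illustrated with a minimal sketch: a patient root node, visit nodes as the non-leaf nodes, and extracted events as leaves. The relabeling rule here (naming each visit by the number of days since the previous visit) is a hypothetical example of capturing one temporal aspect, not either of the two relabeling methods proposed in the study, and `Node`, `day`, `relabel_visits_by_gap`, and `preorder` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    day: int = 0  # hypothetical day offset of a visit or event
    children: List["Node"] = field(default_factory=list)


def relabel_visits_by_gap(root):
    """Hypothetical relabeling: name each visit by days since the previous visit."""
    previous_day = None
    for visit in sorted(root.children, key=lambda v: v.day):
        gap = 0 if previous_day is None else visit.day - previous_day
        visit.label = f"GAP_{gap}d"
        previous_day = visit.day


def preorder(node):
    """Pre-order depth-first traversal, visiting children in time order."""
    sequence = [node.label]
    for child in sorted(node.children, key=lambda c: c.day):
        sequence.extend(preorder(child))
    return sequence


if __name__ == "__main__":
    root = Node("PATIENT", children=[
        Node("visit", day=14, children=[Node("pneumonia")]),
        Node("visit", day=0, children=[Node("fever"), Node("cough")]),
    ])
    relabel_visits_by_gap(root)
    print(" ".join(preorder(root)))
```

The printed sequence, `PATIENT GAP_0d fever cough GAP_14d pneumonia`, is the kind of token stream a document-embedding model could turn into a patient vector; the abstract does not say which embedding model the study used.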

    A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion

    One of the key missions of biology and medical science is to find disease-related genes. Recent research uses gene/protein networks to find such genes. Because these networks contain false positive interactions, the results are often neither accurate nor reliable. Integrating multiple gene/protein networks can overcome this drawback by producing a network with fewer false positive interactions, and the integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. Because the gene/protein sources differ, four gene-gene similarity networks are first constructed and then integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (itself the outcome of integrating four sources) through a bipartite disease-gene network. We propose a novel algorithm, random walk with restart on the heterogeneous network with fuzzy fusion (RWRHN-FF). Running RWRHN-FF over the heterogeneous network determines disease-related genes. Experimental results using leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric, and colon cancers. Since RWRHN-FF converges slowly on large heterogeneous networks, we propose a parallel implementation of it on the Apache Spark platform for high-throughput and reliable network inference. Experiments on heterogeneous networks of different sizes indicate faster convergence than non-distributed implementations.
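Random walk with restart iterates p(t+1) = (1 - r) · W · p(t) + r · p0, where W is the column-normalized adjacency matrix, p0 spreads probability uniformly over the seed nodes, and r is the restart probability. The plain-Python sketch below runs this iteration on a single homogeneous graph until the L1 change drops below a tolerance. In RWRHN-FF, W would instead be assembled from the fused gene-gene network, the disease-disease network, and the bipartite disease-gene links, and the fuzzy fusion step is not shown; `rwr`, `column_normalize`, and the default `restart=0.7` are assumptions.

```python
def column_normalize(adj):
    """Divide each column by its sum so every non-empty column sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def rwr(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p = (1 - r) W p + r p0 to convergence."""
    n = len(adj)
    w = column_normalize(adj)
    # Restart vector: uniform probability over the seed nodes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    print(rwr(chain, seeds={0}))
```

On a path graph seeded at node 0, the steady-state probabilities fall off with distance from the seed, which is the property used to rank candidate genes. Each iteration is a sparse matrix-vector product, which is the part a Spark implementation would distribute.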

    A review of network-based approaches to drug repositioning

    Experimental drug development is time-consuming, expensive, and limited to a relatively small number of targets. However, recent studies show that repositioning existing drugs can be more efficient than de novo experimental drug development at minimizing costs and risks. Previous studies have demonstrated that network analysis is a versatile platform for this purpose, since biological networks are used to model interactions between many different biological concepts. The present study reviews network-based methods for predicting drug targets for drug repositioning. For each method, we describe the preferred type of data set, discuss its advantages and limitations, and provide a brief description as well as an evaluation based on its performance metrics. We conclude that distinct and complementary data should be integrated, because each type of data set reveals a unique aspect of information about an organism. We also suggest that a standard set of evaluation metrics and data sets is essential in this fast-growing research domain.

    Systems Biology–Derived Genetic Signatures of Mastitis in Dairy Cattle: A New Avenue for Drug Repurposing

    Mastitis, a disease with high incidence worldwide, is the most prevalent and costly disease in the dairy industry. Gram-negative bacteria such as Escherichia coli (E. coli) are assumed to be among the leading agents causing acute severe infection with clinical signs. Environmental mastitis pathogens such as E. coli are the primary etiological agents of bovine mastitis in well-managed dairy farms. The response to E. coli infection has a complex pattern affected by genetic and environmental parameters. Moreover, the efficacy of antibiotic and/or anti-inflammatory treatment in E. coli mastitis is still a topic of scientific debate, and studies on the treatment of clinical cases show conflicting results. Unraveling the bio-signature of mastitis in dairy cattle can open new avenues for drug repurposing. In the current research, a novel semi-supervised heterogeneous label propagation algorithm named Heter-LP, which applies both local and global network features for data integration, was used to identify potential novel therapeutic avenues for the treatment of E. coli mastitis. Online data repositories relevant to known diseases, drugs, and gene targets, along with other specialized biological information on E. coli mastitis, including critical genes with robust bio-signatures, drugs, and related disorders, were used as input data for analysis with the Heter-LP algorithm. Our research identified novel drugs such as Glibenclamide, Ipratropium, Salbutamol, and Carbidopa as possible therapeutics against E. coli mastitis. The predicted relationships can be used by pharmaceutical scientists or veterinarians to find commercially efficacious medicines, or combinations of two or more active compounds, to treat this infectious disease.
    This article is published as Sharifi S, Lotfi Shahreza M, Pakdel A, Reecy JM, Ghadiri N, Atashi H, Motamedi M, Ebrahimie E. Systems Biology–Derived Genetic Signatures of Mastitis in Dairy Cattle: A New Avenue for Drug Repurposing. Animals. 2022; 12(1):29. https://doi.org/10.3390/ani12010029. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

    Heter-LP: A heterogeneous label propagation algorithm and its application in drug repositioning

    Drug repositioning offers an effective solution to drug discovery, saving both time and resources by finding new indications for existing drugs. Typically, a drug takes effect via its protein targets in the cell. As a result, it is necessary for drug development studies to conduct an investigation into the interrelationships of drugs, protein targets, and diseases. Although previous studies have made a strong case for the effectiveness of integrative network-based methods for predicting these interrelationships, little progress has been achieved in this regard within drug repositioning research. Moreover, the interactions of new drugs and targets (lacking any known targets and drugs, respectively) cannot be accurately predicted by most established methods. In this paper, we propose a novel semi-supervised heterogeneous label propagation algorithm named Heter-LP, which applies both local and global network features for data integration. To predict drug-target, disease-target, and drug-disease associations, we use information about drugs, diseases, and targets as collected from multiple sources at different levels. Our algorithm integrates these various types of data into a heterogeneous network and implements a label propagation algorithm to find new interactions. Statistical analyses of 10-fold cross-validation results and experimental analyses support the effectiveness o